Speech Localisation in a Multitalker Mixture by Humans and Machines
Authors
Abstract
Speech localisation in multitalker mixtures is affected by the listener’s expectations about the spatial arrangement of the sound sources. This effect was investigated via experiments with human listeners and a machine system, in which the task was to localise a female-voice target among four spatially distributed male-voice maskers. Two configurations were used: the masker locations were either fixed or varied from trial to trial. The machine system uses deep neural networks (DNNs) to learn the relationship between binaural cues and source azimuth, and exploits top-down knowledge about the spectral characteristics of the target source. Performance was examined in both anechoic and reverberant conditions. Our experiments show that the machine system outperformed listeners in some conditions. Both the machine and listeners were able to make use of a priori knowledge about the spatial configuration of the sources, but the effect for headphone listening was smaller than that previously reported for listening in a real room.
Similar resources
Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....
Design Considerations for Improving the Effectiveness of Multitalker Speech Displays
Although many researchers have commented on the potential of audio display technology to improve intelligibility in multitalker speech communication tasks, no consensus has been reached on how to design an “optimal” multitalker speech display. This paper reviews a set of experiments that used a consistent procedure to evaluate the impact of six different parameters on overall intelligibility in...
A Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...
Multitalker speech perception with ideal time-frequency segregation: effects of voice characteristics and number of talkers.
When a target voice is masked by an increasingly similar masker voice, increases in energetic masking are likely to occur due to increased spectro-temporal overlap in the competing speech waveforms. However, the impact of this increase may be obscured by informational masking effects related to the increased confusability of the target and masking utterances. In this study, the effects of targe...
Gaussian Mixture Model: A Modeling Technique for Speaker Recognition and its Component
This paper provides an overview of Gaussian Mixture Model (GMM) and its component of speech signal. During the earlier period it has been revealed that Gaussian Mixture Model is very much appropriate for voice modeling in speaker recognition system. For Speaker recognition, Gaussian mixture model is an essential appliance of statistical clustering. The task effortlessly performed by humans is n...
Journal title:
Volume, issue:
Pages: -
Publication date: 2016